Giraphx: Parallel Yet Serializable Large-Scale Graph Processing
Abstract
Bulk Synchronous Parallelism (BSP) provides a good model for parallel processing of many large-scale graph applications; however, it is unsuitable or inefficient for graph applications that require coordination, such as graph coloring, subcoloring, and clustering. To address this problem, we present an efficient modification to the BSP model that implements serializability (sequential consistency) without reducing the highly parallel nature of BSP. Our modification bypasses the message queues in BSP and reads directly from the worker’s memory for internal vertex executions. To ensure serializability, coordination (implemented via dining philosophers or token ring) is performed only for border vertices partitioned across workers. We implement our modifications to BSP on Giraph, an open-source clone of Google’s Pregel. We show through a graph-coloring application that our modified framework, Giraphx, provides much better performance than implementing the application using dining philosophers over Giraph. In fact, Giraphx outperforms Giraph even for embarrassingly parallel applications that do not require coordination, e.g., PageRank.
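The scheduling idea in the abstract can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the Giraphx implementation and not the Giraph API: the function names (`color_graph`, `is_proper`), the `owner` partition map, and the round-robin token schedule are all hypothetical. Internal vertices read their neighbors' colors directly from the worker's live state, while border vertices update only in the superstep in which their worker holds the token, reading remote colors from a snapshot that stands in for messages delivered at the superstep barrier.

```python
from typing import Dict, List, Set


def smallest_free_color(used: Set[int]) -> int:
    """Return the smallest non-negative color not present in `used`."""
    c = 0
    while c in used:
        c += 1
    return c


def is_proper(adj: Dict[int, List[int]], color: Dict[int, int]) -> bool:
    """True when no edge connects two vertices of the same color."""
    return all(color[v] != color[u] for v in adj for u in adj[v])


def color_graph(adj: Dict[int, List[int]],
                owner: Dict[int, int],
                num_workers: int,
                max_supersteps: int = 100) -> Dict[int, int]:
    """Simulated greedy coloring (hypothetical sketch, not Giraphx code).

    adj   : undirected adjacency lists
    owner : vertex -> worker id (the partitioning is an assumption)
    """
    color = {v: 0 for v in adj}  # all vertices start in conflict

    def is_border(v: int) -> bool:
        # A border vertex has at least one neighbor on another worker.
        return any(owner[u] != owner[v] for u in adj[v])

    for step in range(max_supersteps):
        if is_proper(adj, color):
            break
        # Remote values as they would arrive via messages at the barrier.
        snapshot = dict(color)
        # Assumed token-ring schedule: one worker per superstep.
        token_holder = step % num_workers
        for w in range(num_workers):
            for v in sorted(x for x in adj if owner[x] == w):
                if is_border(v) and w != token_holder:
                    continue  # border vertices wait for the token
                used = set()
                for u in adj[v]:
                    if owner[u] == w:
                        used.add(color[u])     # direct in-memory read
                    else:
                        used.add(snapshot[u])  # value from last superstep
                if color[v] in used:
                    color[v] = smallest_free_color(used)
    return color


# Six-vertex ring split across two workers; vertices 0, 2, 3, 5 are border.
ring = {0: [1, 5], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 0]}
ring_owner = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
ring_colors = color_graph(ring, ring_owner, num_workers=2)
print(is_proper(ring, ring_colors))
```

Because only the token holder changes border vertices in a given superstep, the snapshot value of every remote neighbor equals its live value when a border vertex reads it, so each update behaves as if the vertices ran one at a time. Internal vertices need no such coordination and can update in every superstep.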
Similar Resources
BPP: Large Graph Storage for Efficient Disk Based Processing
Processing very large graphs, such as social networks and biological and chemical compounds, is a challenging task. Distributed graph processing systems process billion-scale graphs efficiently but incur the overheads of partitioning and distributing the graph over a cluster of nodes. Distributed processing also requires cluster management and fault tolerance. In order to overcome these pr...
Challenges in Parallel Graph Processing
Graph algorithms are becoming increasingly important for solving many problems in scientific computing, data mining and other domains. As these problems grow in scale, parallel computing resources are required to meet their computational and memory requirements. Unfortunately, the algorithms, software, and hardware that have worked well for developing mainstream parallel scientific applications...
Explore Efficient Data Organization for Large Scale Graph Analytics and Storage
Many Big Data analytics essentially explore the relationship among interconnected entities, which are naturally represented as graphs. However, due to the irregular data access patterns in the graph computations, it remains a fundamental challenge to deliver highly efficient solutions for large scale graph analytics. Such inefficiency restricts the utilization of many graph algorithms in Big Da...
Large-Scale Graph Analytics in Aster 6: Bringing Context to Big Data Discovery
Graph analytics is an important big data discovery technique. Applications include identifying influential employees for retention, detecting fraud in a complex interaction network, and determining product affinities by exploiting community buying patterns. Specialized platforms have emerged to satisfy the unique processing requirements of large-scale graph analytics; however, these platforms d...
A Scalable Parallel Force-Directed Graph Layout Algorithm
Understanding the structure, dynamics, and evolution of large graphs is becoming increasingly important in a variety of fields. The demand for visual tools to aid in this process is rising accordingly. Yet, many algorithms that create good representations of small and medium-sized graphs do not scale to larger graph sizes. The exploitation of the massive computational power provided by parallel...